[BugFix][NPU] Honor prefer_model_sampler in NPU AR runner by gcanlin · Pull Request #3517 · vllm-project/vllm-omni

gcanlin · 2026-05-11T16:12:50Z

Summary

Fix #3503: HunyuanImage3 AR output on NPU is missing the <recaption> opening tag after </think>, which breaks the downstream DiT stage. The same code path (prefer_model_sampler) is also used by CosyVoice3, so this fix benefits any model that opts into custom sampling.

Root Cause

HunyuanImage3 declares prefer_model_sampler = True and implements a custom sample() method that ports the official _StageTransitionLogitsProcessor. After </think>, it overrides logits to force <recaption> (and analogous transitions for </recaption>).
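To make the forcing step concrete, here is a minimal sketch of a stage-transition override, not the actual HunyuanImage3 code. It assumes the override masks every logit except the forced successor. The token ids, `force_stage_transition`, and the `transitions` mapping are hypothetical names for illustration only.

```python
import math

# Hypothetical token ids; the real values come from the model's tokenizer.
THINK_END_ID = 7        # stands in for </think>
RECAPTION_START_ID = 8  # stands in for <recaption>


def force_stage_transition(logits, last_token_id, transitions):
    """Mask all logits except the token that must follow `last_token_id`.

    `transitions` maps a just-emitted token id to its forced successor.
    When no transition applies, a copy of the logits is returned unchanged.
    """
    forced = transitions.get(last_token_id)
    if forced is None:
        return list(logits)
    masked = [-math.inf] * len(logits)
    masked[forced] = 0.0
    return masked


def argmax(values):
    """Index of the largest value (greedy pick)."""
    return max(range(len(values)), key=values.__getitem__)


transitions = {THINK_END_ID: RECAPTION_START_ID}
logits = [0.1, 2.5, 0.3, 0.0, 0.0, 0.0, 0.0, 0.2, -1.0, 0.4]

# No transition pending: the ordinary distribution is kept.
free = force_stage_transition(logits, last_token_id=3, transitions=transitions)
# Right after </think>: only the <recaption> id keeps a finite logit.
forced = force_stage_transition(logits, THINK_END_ID, transitions)
```

Because every other logit is `-inf`, the forced token is the only one with nonzero probability after softmax, whatever temperature, top_p, or top_k is applied afterwards.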

The GPU AR runner honors this contract at gpu_ar_model_runner.py::_sample:

if logits is not None and callable(model_sample) and \
        getattr(self.model, "prefer_model_sampler", False):
    sampler_output = model_sample(
        logits,
        self._sampling_metadata_for_model_sampler(sampling_metadata),
    )
    if sampler_output is not None:
        return sampler_output

NPUARModelRunner had no such override. It inherited vllm-ascend.NPUModelRunner._sample, which is unconditionally:

return self.sampler(logits=logits, sampling_metadata=sampling_metadata)

So model.sample() was never called on NPU, and the stage-transition forcing logic was bypassed entirely. With sampling parameters temperature=0.6, top_p=0.95, top_k=1024, the model sampled freely after </think>, which on Chinese prompts almost always produced ordinary text (请...) instead of the <recaption> token.

This is purely an integration gap, not a sampler-algorithm difference between CUDA and NPU. The dispatch hook was added in #1703 for CosyVoice3 and gates on the generic prefer_model_sampler attribute, so HunyuanImage3 (#2713) opted into it for free on GPU. NPU never picked up the same generalization.
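The gap can be reproduced with toy classes. The sketch below is illustrative only: `ToySampler`, `ToyModel`, `BaseRunner`, and `DispatchingRunner` are hypothetical stand-ins, not the real vllm, vllm-ascend, or vllm-omni classes. It shows a runner that always uses the generic sampler next to one that checks `prefer_model_sampler` first.

```python
class ToySampler:
    """Stand-in for the generic sampler: greedy over raw logits."""

    def __call__(self, logits, sampling_metadata=None):
        return max(range(len(logits)), key=logits.__getitem__)


class ToyModel:
    """Model that opts into custom sampling and always forces token 2."""

    prefer_model_sampler = True
    FORCED_TOKEN = 2

    def sample(self, logits, sampling_metadata=None):
        return self.FORCED_TOKEN


class BaseRunner:
    """Mimics a runner whose _sample unconditionally uses self.sampler."""

    def __init__(self, model):
        self.model = model
        self.sampler = ToySampler()

    def _sample(self, logits, sampling_metadata=None):
        return self.sampler(logits, sampling_metadata)


class DispatchingRunner(BaseRunner):
    """Thin override: try model.sample() first, else fall through."""

    def _sample(self, logits, sampling_metadata=None):
        model_sample = getattr(self.model, "sample", None)
        if (
            logits is not None
            and callable(model_sample)
            and getattr(self.model, "prefer_model_sampler", False)
        ):
            output = model_sample(logits, sampling_metadata)
            if output is not None:
                return output
        return super()._sample(logits, sampling_metadata)


logits = [0.1, 0.9, 0.0]
bypassed = BaseRunner(ToyModel())._sample(logits)        # generic sampler wins
honored = DispatchingRunner(ToyModel())._sample(logits)  # model sampler wins
```

With identical logits, the base runner returns the greedy token while the dispatching runner returns the model's forced token, which is the difference observed between NPU and GPU.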

Fix

- Move _build_model_sampler_output_token_ids and _sampling_metadata_for_model_sampler from GPUARModelRunner to OmniGPUModelRunner. They are pure logic over self.input_batch with no device-specific code. The MRO for OmniNPUModelRunner is OmniNPUModelRunner → OmniGPUModelRunner → NPUModelRunner → ..., so NPU inherits the helpers automatically.
- Add a thin _sample override in NPUARModelRunner mirroring the GPU one. On the fall-through (no model sampler, or spec-decode), call super()._sample(...) so NPU keeps its lmhead_tp_enable logits slicing and rejection_sampler path unchanged.
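The MRO argument can be checked with stub classes. This is a sketch under assumptions: the class names below are hypothetical stand-ins, and the base-class layout (both the Omni GPU runner and the Ascend runner deriving from one common GPU runner) is assumed so that the linearization matches the order stated above. It is not taken from the real sources.

```python
class VllmGPURunner:
    """Stand-in for the upstream GPU runner."""

    def _sample(self, logits):
        return "vllm GPU sampler"


class AscendNPURunner(VllmGPURunner):
    """Stand-in for the vllm-ascend runner with its own _sample."""

    def _sample(self, logits):
        return "vllm-ascend NPU sampler"


class OmniGPURunner(VllmGPURunner):
    """Hosts a device-agnostic helper and defines no _sample of its own."""

    def _sampling_metadata_for_model_sampler(self, sampling_metadata):
        return {"wrapped": sampling_metadata}


class OmniNPURunner(OmniGPURunner, AscendNPURunner):
    pass


class NPUARRunner(OmniNPURunner):
    def _sample(self, logits):
        # Fall-through: the next _sample in the MRO is the Ascend one.
        return super()._sample(logits)


mro_names = [cls.__name__ for cls in NPUARRunner.__mro__]
runner = NPUARRunner()
fallthrough = runner._sample(None)
helper_result = runner._sampling_metadata_for_model_sampler("meta")
```

The helper defined on the Omni GPU stand-in is reachable from the NPU AR stand-in, and `super()._sample(...)` lands on the Ascend implementation because the Omni GPU class does not define `_sample`.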

Touched files (+83 / -48):

vllm_omni/worker/gpu_model_runner.py — host the two shared helpers
vllm_omni/worker/gpu_ar_model_runner.py — drop the duplicates
vllm_omni/platforms/npu/worker/npu_ar_model_runner.py — drop unused imports, add _sample override

Future Cleanup

This PR is the minimal correctness fix. Two follow-ups are worth doing once we have more prefer_model_sampler users or another platform:

- Push _sample itself down to OmniGPUModelRunner. The dispatch logic is identical between GPU and NPU; only the fall-through target differs, and super()._sample(...) resolves correctly via MRO on both sides (GPU → vllm.GPUModelRunner._sample, NPU → vllm-ascend.NPUModelRunner._sample with lmhead_tp_enable slicing and rejection sampler). Holding off here only because the logit-bias call uses self.sampler.logit_bias_state, and we want one more pair of eyes on whether that shape is identical on both platforms before merging the override.
- Collapse GPUARModelRunner / NPUARModelRunner duplication. The two files are ~1040 lines each with ~95% structural overlap (_request_final_stage_id, _request_needs_downstream_stage_payload, _resolve_pooler_payload_req_ids, _resolve_req_hidden_states, _maybe_update_prefix_cache, _resolve_global_request_id, the propose_draft_token_ids shape, etc.). Two viable shapes:
  - Mixin (ModelSamplerDispatchMixin, etc.): smallest blast radius; both runners mix it in. Each new shared concern becomes its own mixin.
  - Sibling-merge via NPUARModelRunner(GPUARModelRunner, NPUModelRunner): matches the existing OmniNPUModelRunner shape, but pulls in execute_model / sample_tokens / _capture_talker_mtp_graphs / capture_model (all device-specific), which NPU still has to override. The surface saved isn't worth the silent-regression risk (any new helper added to the GPU runner that touches torch.cuda.* would auto-leak to NPU).
  Recommend mixin first; revisit sibling-merge if 3+ shared helpers accumulate without device-specific contamination.
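For reference, the mixin shape could look roughly like the sketch below. Only the name ModelSamplerDispatchMixin comes from the text above. The body and the toy runner and model classes are assumptions made for illustration, not a proposed implementation.

```python
class ModelSamplerDispatchMixin:
    """Hypothetical mixin: try the model's sampler, else defer via the MRO."""

    def _sample(self, logits, sampling_metadata=None):
        model = getattr(self, "model", None)
        model_sample = getattr(model, "sample", None)
        if (
            logits is not None
            and callable(model_sample)
            and getattr(model, "prefer_model_sampler", False)
        ):
            output = model_sample(logits, sampling_metadata)
            if output is not None:
                return output
        # The next class in the MRO supplies the device-specific path.
        return super()._sample(logits, sampling_metadata)


class ToyGPURunner:
    def __init__(self, model):
        self.model = model

    def _sample(self, logits, sampling_metadata=None):
        return "gpu-default"


class ToyNPURunner:
    def __init__(self, model):
        self.model = model

    def _sample(self, logits, sampling_metadata=None):
        return "npu-default"


# The mixin is listed first so its _sample runs before the device runner's.
class ToyGPUARRunner(ModelSamplerDispatchMixin, ToyGPURunner):
    pass


class ToyNPUARRunner(ModelSamplerDispatchMixin, ToyNPURunner):
    pass


class PlainModel:
    """Model without custom sampling."""


class CustomSamplingModel:
    prefer_model_sampler = True

    def sample(self, logits, sampling_metadata=None):
        return "model-sampler"
```

Each AR runner keeps its own device-specific base, so nothing from one platform's runner is inherited by the other; only the dispatch logic is shared.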

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

chatgpt-codex-connector · 2026-05-11T16:12:56Z

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

gcanlin · 2026-05-11T16:14:03Z

@Fishermanykx Could you help test it?

Fishermanykx · 2026-05-13T03:00:58Z

> @Fishermanykx Could you help test it?

Test on 8xAscend 64G NPUs

offline

STAGE_CONFIGS_PATH="${STAGE_CONFIGS_PATH:-${REPO_ROOT}/vllm_omni/deploy/hunyuan_image3.yaml}"
OUTPUT_DIR="${OUTPUT_DIR:-${SCRIPT_DIR}/outputs}"
PROMPT="${PROMPT:-新年宠物海报，Q版圆润的可爱标题“新年快乐汪”，副标题“HAPPY NEW YEAR”。 鱼眼镜头，背景是房间门口，近景，上传的主体歪头笑，围着红色围巾，戴着红色毛线帽，高清，绒毛细节，面部特写。 宝丽莱相纸，超现实主义，写实主义，胶片摄影，打印颗粒感肌理。肌理，超写实，复古感。}"
STEPS="${STEPS:-8}"
GUIDANCE_SCALE="${GUIDANCE_SCALE:-1.0}"
SEED="${SEED:-42}"

python "${SCRIPT_DIR}/end2end.py" \
  --model "${MODEL_PATH}" \
  --image-path "${IMAGE_PATH}" \
  --prompts "${PROMPT}" \
  --deploy-config "${STAGE_CONFIGS_PATH}" \
  --output "${OUTPUT_DIR}" \
  --log-stats \
  --modality "img2img" \
  --steps "${STEPS}" \
  --guidance-scale "${GUIDANCE_SCALE}" \
  --seed "${SEED}" \
  --enforce-eager \
  --bot-task "think"

output

[Output] Text:
用户希望将这张可爱的金毛幼犬照片改造成一张充满节日氛围的新年宠物海报。参考图中是一只吐着舌头、歪着头微笑的金毛幼犬，背景是木质地板和模糊的白色花朵。原始指令非常具体，要求添加特定的标题文字、改变背景、调整构图并应用特定的胶片质感。这是一个中等复杂度的任务，因为它涉及到了图像合成、文字排版、风格迁移和构图调整。首先，我需要处理文字部分，将“新年快乐汪”和“HAPPY NEW YEAR”以圆润可爱的字体放置在图像上方。接着，背景需要从户外的木地板切换到室内的门口场景，这需要保持狗狗作为主体的近景特写。为了增强节日感，狗狗需要佩戴红色的毛线帽和红色的围巾。构图上，原始指令提到了鱼眼镜头效果，这意味着画面边缘会有轻微的弧形畸变，增加视觉冲击力。最后，整体风格要模拟宝丽莱相纸的质感，带有胶片颗粒和复古的色调。在改写指令时，我会把这些元素整合在一起，详细描述狗狗的新装扮、背景的变化、文字的样式和位置，以及整体的艺术风格，确保生成的图像既保留了原图狗狗的神态，又具备浓厚的新年海报氛围。</think>将参考图中的金毛幼犬制作成一张新年主题的海报。请保留狗狗歪头吐舌的可爱表情，但为它戴上一顶红色的针织毛线帽和一条配套的红色围巾。将背景从户外的木地板更换为室内的门口场景，狗狗依然保持近景特写。在图像上方添加圆润可爱的艺术字体标题“新年快乐汪”，下方配以较小的“HAPPY NEW YEAR”字样。整体画面采用鱼眼镜头效果，使边缘产生轻微的弧形畸变。最后，为整张图片添加宝丽莱相纸的白色边框，并赋予其胶片摄影的质感，包括细腻的打印颗粒感和复古的暖色调，营造出一种温馨的节日氛围。</recaption>

Still no <recaption> tag

hsliuustc0106 · 2026-05-16T23:11:52Z

fix it asap

gcanlin · 2026-05-17T16:06:50Z

I can reproduce this text on an A100 GPU.

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin · 2026-05-22T07:57:03Z

</think><recaption> appears now.

[Output] Text:
用户希望将参考图中这只可爱的金毛幼犬转化为一张充满节日氛围的新年宠物海报。参考图展示了一只吐着舌头、表情憨态可掬的小狗，背景是木质地板。原始指令非常具体，要求添加特定的标题文字、改变小狗的配饰、调整构图视角以及应用特定的胶片摄影风格。这是一个中等复杂度的任务，因为它涉及了元素添加、属性修改和整体风格的重塑。首先，我需要处理文字部分，将“新年快乐汪”和“HAPPY NEW YEAR”以圆润可爱的字体放置在画面上方。接着，针对小狗的配饰，需要给它戴上一顶红色的毛线帽，并在脖子上围上一条红色的围巾，这能极大地增强新年的喜庆感。在构图上，原始指令提到了鱼眼镜头和房间门口的背景，这意味着视角需要从原本的平视改为略微俯视且带有畸变的效果，背景也应从单纯的木地板扩展到包含门框和室内陈设的房间场景。最后，为了达到复古胶片的效果，我需要描述一种具有颗粒感、色彩柔和且带有宝丽莱相纸质感的视觉风格。通过这些步骤，可以将一张普通的宠物照改造成一张极具感染力的节日海报。</think><recaption>请将这张金毛幼犬的照片改写为一张复古胶片风格的新年宠物海报。在画面上方添加圆润可爱的艺术字体标题“新年快乐汪” ，下方配以较小的英文“HAPPY NEW YEAR”。给小狗戴上一顶红色的针织毛线帽，并在脖子上围一条厚实的红色围巾。调整构图视角，采用鱼眼镜头效果，使画面呈现出一种从上往下的俯视感，背景由原本的木地板扩展为房间门口的场景，隐约可见室内的家具和门框。整体画面应具有宝丽莱相纸的质感，带有细腻的胶片颗粒感和柔和的复古色调，保留小狗吐舌头、歪头微笑的憨态，确保绒毛细节清晰可见。</recaption>

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin · 2026-05-23T18:42:21Z

@hsliuustc0106 I think we can run only the ready test for this PR. The failure seems unrelated to this PR.

gcanlin · 2026-05-24T10:12:11Z

@hsliuustc0106 The ready CI passes now. Could you please take another look? Thx!

[BugFix][NPU] Honor prefer_model_sampler in NPU AR runner]

a20ee78

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin requested a review from tzhouam as a code owner May 11, 2026 16:12

gcanlin added the ready (label to trigger buildkite CI) label May 11, 2026

gcanlin changed the title ~~[BugFix][NPU] Honor prefer_model_sampler in NPU AR runner]~~ [BugFix][NPU] Honor prefer_model_sampler in NPU AR runner May 11, 2026

zengchuang-hw mentioned this pull request May 18, 2026

[BugFix] Fix prefer_model_sampler token history in async scheduling #3681

Merged

2 tasks

Gaohan123 added this to the v0.22.0 milestone May 22, 2026

Bounty-hunter mentioned this pull request May 22, 2026

[Bug]: HunyuanImage-3.0 requires manual configuration of stop_token_ids on NPU. #3722

Closed

1 task

gcanlin added 2 commits May 22, 2026 07:28

Merge branch 'main' into pre-sampler

b29e8d9

fix

8b3c64b

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

gcanlin force-pushed the pre-sampler branch from 08bf293 to 8b3c64b May 22, 2026 07:31

fix

50376bf

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

hsliuustc0106 added the nightly-test (label to trigger buildkite nightly test CI) and merge-test (label to trigger buildkite merge test CI) labels May 22, 2026

hsliuustc0106 and others added 2 commits May 22, 2026 17:55

Merge branch 'main' into pre-sampler

922fc1f

fix lint

fd9dc7d

Signed-off-by: gcanlin <canlinguosdu@gmail.com>

hsliuustc0106 removed the nightly-test (label to trigger buildkite nightly test CI) and merge-test (label to trigger buildkite merge test CI) labels May 24, 2026

Merge branch 'main' into pre-sampler

1086290

hsliuustc0106 merged commit c890b3a into vllm-project:main May 25, 2026
8 checks passed

zengchuang-hw mentioned this pull request May 25, 2026

[RFC] HunyuanImage Model Bug Tracking #3731

Open

hsliuustc0106 merged 7 commits into vllm-project:main from gcanlin:pre-sampler
